Effective social spider optimization algorithms for distributed assembly permutation flowshop scheduling problem in automobile manufacturing supply chain

This paper presents a novel distributed assembly permutation flowshop scheduling problem (DAPFSP) based on practical problems in automobile production. Different from the existing research on DAPFSP, this study considers that each component of the final product is composed of more than one part. Components are processed in a set of identical components manufacturing factories and are assembled into products in the assembly factory. The integration of manufacturing processes is an important objective of Industry 4.0. For solving this problem with the minimum makespan criterion, we introduce a three-level representation and a novel initialization method. To enhance the search ability of the proposed algorithms, we design three local search methods and two restart procedures according to characteristics of the problem. Then, by incorporating the problem specific knowledge with the social spider optimization algorithm (SSO), we propose three SSO variants: the SSO with hybrid local search strategies (HSSO), the HSSO with restart procedures (HSSOR), and the HSSOR with self-adaptive selection probability (HSSORP). Finally, 810 extended instances based on the famous instances are used to test the proposed algorithms. In most cases, HSSOR performs the best, with an average comparison metric value of 0.158% across three termination conditions, while the average comparison metric value for the best comparison method is 2.446%, which is 15.481 times that of HSSOR. Numerical results demonstrate that the proposed algorithms can solve the problem efficiently.


Problem definition
This section briefly illustrates the presented DAPFSP similar to examples reported by Hatami et al. 12 and Pan et al. 24 .There are one assembly factory and a set of F identical components manufacturing factories (See in Fig. 1).Under the Make-To-Order (MTO) mode of automobile production, the customer places an order for H products to the assembly factory, and then the assembly factory orders components from F components manufacturing factories according to the customer's order.Each final product is composed of several components and each component can be manufactured in any of the F components manufacturing factories.Each component manufacturing factory f ( f = 1, . . ., F ) is formed by the same production stage and an assembly machine.The production stage is a flowshop and consists of m machines.In the flowshop, every part must be processed by m machines in sequence, i.e., M 1 , M 2 , and so on until M m .All parts belonging to one component must be processed continuously in one factory.In the assembly stage of one component manufacturing factory, all parts belonging to the same component are assembled on the assembly machine M A .Finished components are transported to the final assembly factory.Finally, H products {P 1 , P 2 , . . ., P H } are assembled on the final assembly line in the assembly factory and delivered to customers within the delivery date.

Problem formulation
Following assumptions are used to formulate the considered DAPFSP: • All parts of each component are available to be processed at time zero.
• One component must be processed in one components manufacturing factory.Changing factory is not allowed.• The transportation time within and between factories is ignored.
• The assembly operation of one component can be performed if all parts belong to the component are pro- cessed.• When all components of a product have been processed, the assembly operation for that product can begin.Binary variables: The mathematical model of the presented DAPFSP can be formulated as follows: is assembled immediately after component c ′ on the assembly machine 0 otherwise H p,p ′ = 1 if product p is assembled immediately after product p ′ on the final assembly factory 0 otherwise www.nature.com/scientificreports/Formulation (1) illustrates the objective is minimizing the makespan of all products.Constraint sets ( 2) and (3) mandate that every part has only one immediate successor and one immediate predecessor.Constraint set (4) ensures that each component must be assigned to only one factory.Constraint set (5) enforces that the parts belonging to the same component should be processed in succession and cannot be separated.Constraint set (6) makes sure that a part cannot start until the previous operation is completed.Constraint set (7) indicates that one machine can only process a part at a time.Constraint set (8) guarantees that the assembly operation of a component can only be started after all parts belonging to it are completed.Constraint set (9) enforces that the assembly machine cannot assemble two components at the same time.Constraint set (10) requires that one product cannot be assembled in the final assembly factory before all the components of the product are completed.Constraint set (11) indicates that the final assembly factory cannot assemble two products at a time.Constraint set (12) defines the makespan of all products.Constraint sets ( 13)-( 16) define the domain of decision variables.

The canonical SSO algorithm
The concept and theory of the social spider optimization algorithm was developed by Cuevas and Cienfuegos 44 which mathematically models the behaviour of social spiders.The validity of SSO has been proved in many different optimization problems [45][46][47] .
The search space of SSO is regarded as a communal web.Social spiders use the communal web as a communication media to perform cooperative behavior 48 , such as mating, prey capturing and social contact 49 .Different from most of existing swarm algorithms, the SSO divides the social spider population into two different groups: the female group and the male group.According to gender, each social spider is executed by a different set of evolution operators.Moreover, SSO adopts mating operator to exchange information among two groups and produce offspring.These operators enhance the search ability of SSO to find the optimal solution.In this way, the SSO can avoid premature convergence and keep the balance between exploration and exploitation.

The proposed three social spider optimization algorithms
The canonical SSO cannot be used directly for discrete optimization problem because it is designed to solve continuous optimization problem.Therefore, this section presents three discrete SSO algorithms for the proposed DAPFSP in this paper.We first introduce the solution representation, the initialization method, cooperative operators, three neighborhood operators, three local search methods, two restart producers, and present three discrete SSO algorithms in detail.

Solution representation
Solution representation is an important part in the design of scheduling algorithms.We design a three-level representation for the DAPFSP presented in this paper.The representation includes three parts: a product sequence, α component sequences and β part sequences.The product sequence π introduce the processing sequence of all the products in the assembly factory.In components manufacturing factories, the components belonging to products are assigned according to the product sequence.In other words, the components of the first product in the sequence are assigned first.And then, the components belonging to the second product in the sequence are considered.The part sequence of a component is a permutation of all the parts belonging to the component.Suppose a solution s, its product sequence can be expressed as π = {π 1 , π 2 , . . ., π α } ( π 1 : the first product of the product sequence).The component sequence of product where n π k is the number of components belonging to product π k .The part sequence of component , where u � π k ,h is the number of parts belonging to component � π k ,h .An illustrative example is presented in "An illustrative example" section.The product sequence in the assembly factory is {1, 3, 2}.Three component sequences are 1 = {1, 2} , 2 = {3, 4} and 3 = {5, 6} .Six part sequences are

Initial population
The initial population has an important effect on the quality of the solution for swarm intelligence algorithms.In order to improve the quality of the initial population, there are three interdependent decisions need to be dealt with for the presented DAPFSP: (1) determination of product sequence in the assembly factory; (2) allocation of components to components manufacturing factories and determination of components sequence for each product; (3) determination of part sequence for each component.
Inspired by the pioneering work of Naderi and Ruiz 11 , we assigned each component to the factory which has the minimum makespan among components manufacturing factories when including the component.To maintain the diversity of the population and save computation time, the product sequence and the part sequence for each component are generated randomly.

Population division and weight assignation
The SSO divides the social spider population (denoted as S ) into two different groups: female spiders (denoted as F ) and male spiders (denoted as M ).In the social spider population, the number of female spiders (denoted as N f ) outnumber male spiders (denoted as N m ). and N f usually make up 65-90% of the population size (denoted as N ).In this paper, the female ratio is set to 70%, as in paper 45 .Therefore, N f is calculated according to the following formula: where the mathematical symbol | | can take an integer as the number of female spiders.The number of male spiders N m = N − N f .The population S = {s 1 , s 2 , . . ., s N } is composed of a set of female spiders ( www.nature.com/scientificreports/ In the proposed three SSO algorithms, the solution quality of the individual (spider) i is measured by a weight w i .The following equation is used to calculated the weight w i .

Cooperative operators
The search space of the SSO is assumed as a communal web, in which social spiders update positions.In the communal web, female and male social spiders update positions according to different cooperative operators.

Female cooperative operator
Female spiders have two behavior patterns for updating their positions: attraction movement and repulsion movement.The selection of the final behavior pattern for the female spider f i is influenced by three factors: the vibration V c,i perceived by f i form the nearest spider (denoted as s c ) with a better weight; the vibration V b,i per- ceived by f i form the best spider (denoted as s b ) of the population; and a stochastic movement.The movement of the female spider f i can be defined as follows: where f new,i is the newly generated position of the female spider f i .r 1 , r 2 , r 3 , r 4 and r 5 are five random numbers between [0, 1] .PF is a threshold which determines the selection of attraction movement or repulsion movement.In this paper, PF is set to 0.7 for three proposed SSO algorithms.w c and w b are weights of the nearest spider and the best spider, respectively.The notation || • || represents the Euclidean distance.The final random item r 3 (r 4 − 0.5) donates a stochastic movement.

Male cooperative operator
The male spiders are divided into dominant members and non-dominant members according to weight.m m is the male spider whose weight is the median value of the male population.A male spider with a weight greater than the median value is assigned to dominant members; otherwise, it is regarded as one non-dominant member.Dominant male spiders tend towards the nearest female member (denoted as f n ) to generate a new generation.On the contrary, non-dominant male spiders move towards the center of the male group to utilize the remaining resources.The male cooperative operator can be formulated as follows: where m new,i is the newly generated position of the male spider m i .r 1 , r 2 and r 3 are random numbers between [0, 1] .V n,i is the vibration perceived by m i form the nearest female spider f n .w i , w m and w a is the weight of m i , m m and m a , respectively.W mean is the weighted mean of male spiders.w n is the weight of the nearest female spider f n .
The male spiders are sorted descending by their weight.The sequence after permutation is M = {m 1 , m 2 , . . ., m m , . . ., m N f } .The movement of dominant male spiders and non-dominant male spiders are modeled by formulas (20-1) and (20-2), respectively.

Mating operator
Dominant male spiders and female spiders perform mating operator in the communal web.The radius (denoted as r ) limits the mating range of each dominant male spider.For one dominant male spider m k , it finds out a set of female spiders (denoted as E k ) within the mating range.The mating operator is performed in set T k = E k ∪ m k .If E k is a non-empty set, the mating operator is executed to generate a new individual m new ; otherwise, the mating operator is not performed.The radius is calculated by the following formula: where s max and s min represent the boundary of search space.Recall the presented solution representation, there are s max = {p, p − 1, p − 2, . . ., 1} and s min = {1, 2, . . ., p} .ratio ∈ (0, 1) is a factor controlling the radius r.
If the new individual m new is generated, compare its weight with the weight of the worst individual in the population.If the new individual m new is better than the worst individual, the worst individual is replaced by m new , and m new inherits the gender and index of the replaced individual.Otherwise, the new individual m new is ignored and the population remain unchanged.

Neighborhood operators and local search strategies
Neighborhood operators play an important role in designing a heuristic algorithm and it can enhance the exploitation ability of the algorithm.According to the representation, there are three types of neighborhoods.The first one is based on the product sequence, the second type is based on the component sequence, the third kind is based on the part sequence.For one solution, we can shift its product sequence and keep the other two types of sequences unchanged.Shift operators are widely used in scheduling problems.A shift operator moves one element in a sequence to another location and keeps the other elements in the same position.Considering a product sequence π = π 1 , π 2 , . . ., π k , π k+1 , . . ., π α , the first product π 1 can be shifted to the k th position and generating a new sequence π new = π 2 , . . ., π k , π 1 , π k+1 , . . ., π α , while keeping the component sequence of each product and part sequence of each component unchanged.The same shift operator applies to the component-based neighborhood and part-based neighborhood.
According to the presented three neighborhood operators, we design three local search strategies.The first local search strategy is designed based on the neighborhood operator of the produce sequence.For an individual s i with a product sequence π = π 1 , π 2 , . . ., π k , π k+1 , . . ., π α , the product π k can be inserted into α − 1 possible positions and generate α − 1 different neighborhood solutions.s * i is the best solution of all the α − 1 neighbor- hood solutions.If s * i is better than s i , replace s i with s * i .Otherwise, s i remains unchanged.The above process is performed for each product in the product sequence.The procedure of product-based local search strategy is presented in Algorithm 1.

Algorithm 1 Product-based local search
The second lo cal search strateg y is designed based on comp onent sequences.
where n π k is the number of components belonging to product π k .The component � π k ,h can be inserted into n π k − 1 possible positions and produce n π k − 1 different neighborhood solutions.s * i is the best solution of all the n π k − 1 neigh- borhood solutions.If s * i is better than s i , replace s i with s * i .Otherwise, s i remains unchanged.The above process is performed for each component of the product π k .The procedure of component-based local search strategy is illustrated in Algorithm 2.
, where u � π k ,h is the number of parts belonging to component � π k ,h .The part δ � π k ,h ,j can be inserted into u � π k ,h − 1 possible positions and can produce u � π k ,h − 1 different neighborhood solutions.s * i is the best solution of all the u � π k ,h − 1 neighborhood solutions.If s * i is better than s i , replace s i with s * i .Otherwise, s i remains unchanged.The above process is performed for each part of the component � π k ,h .The procedure of part-based local search strategy is shown in Algorithm 3.
The minimum makespan of each iteration is stored.If the minimum is not improved in ten consecutive iterations, we adopt the following OBR procedure to increase population diversity.
Step 1 Sort all spiders in the population in descending order of weight.
Step 2: Take the top 10% of individuals and calculate their opposite positions based on the sequence of products.
Step 3 An opposite position is accepted if the makespan is better, while a poor opposite position (denoted as π ) is accepted according to the following probability p π:  www.nature.com/scientificreports/ Step 4 Among the remaining 90% individuals, randomly select the same number of individuals as the opposite positions, and replace the selected individuals with the opposite positions.

Inverse-based restart procedure
Since the permutation of components and parts is not a set of consecutive integers, the OBL cannot be used directly.For the component-based sequence and part-based sequence, we apply an inverse operation to generate new permutations.The procedure of inverse operator is shown as follows: (a) randomly select two points P 1 and P 2 of the permutation; (b) reverse the sequence between P 1 and P 2 .Figure 3 presents an instance of the inverse operator.The procedure of inverse-based restart (IBR) is shown as follows: Step 1 Sort all spiders in the population in descending order of weight.
Step 2 Take the top 10% of individuals and calculate their inverse neighborhoods based on the sequence of components and parts.
Step 3 An inverse position is accepted if the makespan is better, while a poor inverse position (denoted as π ) is accepted according to the formula (24).
Step 4 Among the remaining 90% individuals, randomly select the same number of individuals as the inverse neighborhoods, and replace the selected individuals with the opposite positions.

SSO with hybrid local search strategies
Three local search strategies mentioned above are quite different, and they can enhance the search ability of discrete SSO.In order to make full use of the advantages of these strategies and obtain the optimal solution or the near optimal solution, we propose a hybrid SSO (HSSO for short) that adopts these strategies simultaneously.The procedure of HSSO is illustrated in Algorithm 4.

HSSO with restart procedures
With the evolution of HSSO, an increasing number of individuals have a tendency to converge together, so the algorithm may be trapped in local optimal.To maintain the diversity of the population and avoid premature convergence, we present the HSSO with two restart procedures (HSSOR for short).The procedure of HSSOR is shown in Algorithm 5.

Experimental design
Since the DAPFSP presented in this paper is an extension of the innovative work of Hatami et al. 12,13 and Pan et al. 24 , the instances of the proposed problem are extended based on the 810 large instances of Hatami et al. 12 .We set instances with the number of parts {100, 200, 500}, the number of machines in the production stage M ∈{5, 10, 20}, the number of components manufacturing factories F ∈{4, 6, 8}, the number of components β ∈{30, 40, 50}.To satisfy the assumption that each product is composed of at least two components and limit the number of instances, we assign 30 components, 40 components and 50 components to 10 products, 15 products and 20 products, respectively.For each instance, all time indexes are set to integers.The processing time of each part in different product stages is fixed to U [1, 99].The assembly time of each component is set to where n is the number of parts belonging to the component.The assembly time of each product in the assembly factory is set to where n is the number of components belonging to the product.There are 3 × 3 × 3 × 3 = 81 different combinations and each combina- tion has 10 replications.The 810 instances zip file is available.For example, the notation "I_100_5 _4_30_10_1" means an instance consists of 100 parts, 5 machines, 4 components manufacturing factories, 30 components and 10 products.The last number '1' of the notation indicates that the instance is first one of the combination.
The relative percentage deviation (RPD) is used to measure solutions obtained by different algorithms.The formula for calculating RPD is as follows: where is C max the makespan calculated by one algorithm, and C best is the minimum makespan obtained by all comparing algorithms.

Experimental calibration
In this subsection, we calibrate the proposed three SSO variants and comparing algorithms.This study is the first work to solve the DAPFSP with the assumption that each component of the final product is composed of several parts.This kind of problem is widespread in the actual production process, especially in auto parts supply chain, but there is no published works to make comparisons.To verify the validity of the proposed three algorithms, we conduct a comparison with two excellent work on the distributed flowshop scheduling problem including CMA 17 and EDA 19 .By incorporating the presented solution representation and the above-mentioned objective function, we adopt CMA and EDA to the proposed DAPFSP.
The parameters of the algorithm affect its performance.To calibrate these five algorithms, we generate an instance for each combination randomly and employ Taguchi method 57 .To set parameters effectively, we conducted preliminary experiments.For the HSSORP, the population size P s , the levels of the self-adaptive selection probability P sa ( ρ min -ρ max ), and the parameter ratio of mating operator, are shown in Table 2.
We use C++ o code the HSSOPR in VS 2015 and the elapsed CPU time is set to 20 × χ × M milliseconds.For three critical parameters, we chose the orthogonal array L 16 (4 3 ) that test 16 combinations of different parameter levels.The HSSOPR runs 20 times independently for each combination on a computer with 8 Intel(R) Core (TM) i5-10210U CPU @ 1.60GHz, 8 GB RAM and Windows 10 operation system.The average RPD (aRPD) values is calculated and presented in Table 3.The parameter level trend is shown in Fig. 4. According to Fig. 4, HSSORP performs better with the following parameters: P s = 100, ρ min = 0.2, ρ max = 0.8, ratio = 0.6.The other four algo- rithms are calibrated by the same method, and Table 4 details parameters of all five algorithms.

Computational evaluation
We code all the algorithms using C++ in VS 2015 and solve the 810 instances on the above-mentioned computer.To make the experiment fair, instead of using the number of iterations as the termination condition, we let each algorithm run independently for the same time.Termination is triggered when the algorithm reaches the maximum elapsed CPU time t = τ × χ × M milliseconds and the parameter τ is set to is set to three values 20, 40, and 60 respectively.In the given three different termination times, each algorithm runs 20 times independently for 810 instances.The best solutions of 810 instances calculated by each algorithm can be accessed.Tables 5, 6 and 7 presents the summarized aRPD values of three terminations.
Table 5 reports the aRPD values with CPU time t = 20 × χ × M milliseconds.It can be seen from the last row that HSSOR and HSSORP performs better than HSSO, which proves the effectiveness of the presented restart procedures.There is only a small difference in the performance of HSSOR and HSSORP.EDA gets the worst performance with 12.248% aRPD, which is almost 65 times that of HSSOR.This is because the search method (       We analyze the results closely.EDA is taken out of the analysis because it is not suitable for the proposed problem.Figures 5, 6 and 7 present box plots for four comparison algorithms at three diffident CPU times.It can be clearly observed that: (1) statistically, our algorithms are much better than CMA, and this can prove that our strategies and algorithms is effective; (2) HSSOR and HSSORP is better than HSSO, and this can illustrate the effectiveness of the proposed restart procedures; (3) HSSOR is slightly better than HSSORP, and this can show that using three local search strategies simultaneously can achieve better results.
In addition, five parameters including the number of parts χ, the number of components β, the number of machines M, the number of factories F, and the number of products α determine the proposed problem.We conduct data analysis to figure out the effects of these parameters on the four comparison algorithms.Figure 8 shows variation trend of aRPD for different value levels of five parameters.It can be seen from Fig. 8 that both of HSSOR and HSSORP have very slight fluctuations and small values which illustrate that both of them have excellent robustness and performance.According to the data, all values of HSSOR for each parameter are smaller than HSSORP, which indicates HSSOR is better than HSSORP.
Finally, to illustrate the proposed DAPFSP intuitively, Fig. 9 reports the Gantt chart of instance "I_100_5_4_30_10_1" with the best solution we found so far.

Conclusions
In conclusion, our study makes several contributions from different perspectives: theoretical, managerial, and practical.(i) Theoretical contribution: This paper presents the first study on DAPFSP considering the intricate nature of modern supply chains, where components of one final product are manufactured by multiple companies.Our proposed problem formulation requires determining the optimal product sequence in the assembly factory, as well as the component sequence and part sequence in multiple component manufacturing factories.
To address this challenge, we introduce a novel three-level representation and an initialization method, along with three local search methods to enhance the algorithm's search ability.Additionally, to prevent premature convergence, we design two restart procedures tailored to the problem's characteristics.Furthermore, we propose three discrete SSO algorithms for the proposed DAPFSP.Our experiments, calibrated using the Taguchi method, demonstrate the efficiency of these algorithms in solving the problem.(ii) Our study provides insights into the critical role of efficient scheduling in supply chain management.By addressing a crucial aspect of production processes, it partially bridges the gap between academic research and practical applications.However, it is important to acknowledge that our research does not encompass all constraints present in practical production scenarios, such as transportation costs 58 , production capacity constraints, setup times, machine maintenance, and rescheduling in emergencies 59 .These aspects are vital considerations for real-world applications and should be incorporated into future research endeavors.(iii) In the view of practical standpoint, our research sets the stage for further exploration into the integration of manufacturing processes in Industry 4.0 60 .Moving forward, we are committed to addressing the aforementioned constraints and designing efficient scheduling algorithms that can significantly enhance production efficiency in practical settings.By focusing on the challenges encountered

Figure 3 . 5 HSSOR
Figure 3.An example of the inverse operator.

Figure 5 .
Figure 5. Box plots for four comparison algorithms at CPU time t = 20 × χ × M ms.

Figure 6 .
Figure 6.Box plots for four comparison algorithms at CPU time t = 40 × χ × M ms.

Figure 7 .
Figure 7. Box plots for four comparison algorithms at CPU time t = 60 × χ × M ms.

Figure 8 .
Figure 8. Variation trend of aRPD for different value levels of five parameters.

Table 4 .
Parameters of five algorithms.
EDA may not be suitable for the proposed problem.CMA performs much better than EDA.Moreover, the proposed three algorithms outperform the comparison algorithms.Tables6 and 7present the aRPD values with CPU time t = 40 × χ × M milliseconds and CPU time t = 60 × χ × M milliseconds respectively.As we can see from two tables, as running time increases, our proposed algorithm can maintain a very large lead in performance.At CPU time t = 40 × χ × M milliseconds, HSSOR and HSSORP are almost equal, while HSSORP outperforms HSSOR at CPU time t = 60 × χ × M milliseconds. of

Table 5 .
aRPD at CPU time t = 20 × χ × M ms.Better results are in bold.

Table 6 .
aRPD at CPU time t = 40 × χ × M ms.Better results are in bold.

Table 7 .
aRPD at CPU time t = 60 × χ × M milliseconds.Better results are in bold.